Implement Clustering and App related functions #4407
base: master
Conversation
naiming-zededa commented Oct 30, 2024
- change VMI to VMI ReplicaSet for kubernetes
- change Pod to Pod ReplicaSet for containers
- change functions handling replicaset names in services
- subscribe EdgeNodeInfo in domainmgr, zedmanager to get node-name for cluster
- add Designated Node ID to several structs for App
- not to delete domain from kubernetes if not a Designated App node
- parse config for EdgeNodeClusterConfig in zedagent
- handle ENClusterAppStatus publication in zedmanager in the multi-node clustering case
- zedmanager handles effective-activation including ENClusterAppStatus
- kubevirt hypervisor changes to handle VMI/Pod ReplicaSets
Force-pushed from 631cc70 to 1aed95e
}
if err := hyper.Task(status).Cleanup(status.DomainName); err != nil {
log.Errorf("failed to cleanup domain: %s (%v)", status.DomainName, err)
// in cluster mode, we can not delete the pod due to failing to get app info
Was the issue this is fixing something appearing when there is a failover/takeover and another node in the cluster starts running the app instance?
Or is it something which could happen when an app instance is first provisioned on the cluster?
This can happen even on the first node of the app deployment: sometimes we can not get the status from the k3s cluster, or sometimes it takes time for the app to come up to the running state. But we should not remove the kubernetes configuration; it has the config stored in the database, and it has its own scheduling and control process to eventually bring it to the intended state. If we delete the config from the cluster, then we need to wait another 10 minutes to retry, etc., and it will cause confusion.
So a new boolean, DomainConfigDeleted, is introduced in the DomainStatus. It allows the Designated node, if it knows for sure the app instance has been removed from the device, to go ahead and delete the app/domain from the cluster.
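For illustration, a minimal sketch of what this gating could look like (simplified stand-in types; not the PR's exact code):

```go
// Sketch only: gate the Kubernetes-side delete on being the Designated App
// node and on the config really being deleted. DomainStatus here is a
// trimmed-down stand-in for the pillar type.
package main

import "fmt"

type DomainStatus struct {
	DomainName          string
	IsDNidNode          bool // this node is the Designated App node
	DomainConfigDeleted bool // the app instance config was removed from the device
}

// mayDeleteFromCluster reports whether the domain may be removed from the
// cluster, or must be left for the Kubernetes controller to reconcile.
func mayDeleteFromCluster(hvTypeKube bool, status DomainStatus) bool {
	if !hvTypeKube {
		return true // non-kubevirt hypervisors always clean up locally
	}
	return status.IsDNidNode && status.DomainConfigDeleted
}

func main() {
	st := DomainStatus{DomainName: "app1", IsDNidNode: true, DomainConfigDeleted: false}
	fmt.Println(mayDeleteFromCluster(true, st)) // false: keep the cluster config
}
```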
It would be good to capture the above explanation either in a comment here or in pkg/pillar/docs/zedkube.md
Sure. I documented this in zedkube.md and referenced it from domainmgr.go.
Hmm - re-reading it and it still looks odd.
Does the kubevirt Info() return an error or does it return HALTED when the issue is merely that it can't (yet) fetch the status from k3s?
Can we not fix that Info() to return something more accurate?
If it returns an error or HALTED then this function will set an error, and that error might be propagated to the controller, which would be confusing if the task is slow at starting, or is starting on some other node.
So I think this check is in the wrong place.
The current logic in kubevirt.go is:
- if the app is not found, it returns status "" and an error: logError("getVMIStatus: No VMI %s found with the given nodeName %s", repVmiName, nodeName)
- if the app is found on another node in the cluster, it returns status "nolocal" and no error
- if the app is found on this node, it returns whatever the kubernetes app running status is
With the above conditions, if an error is returned, the status is set to types.Broken.
We further map the status string with this mapping:
// Use few states for now
var stateMap = map[string]types.SwState{
"Paused": types.PAUSED,
"Running": types.RUNNING,
"NonLocal": types.RUNNING,
"shutdown": types.HALTING,
"suspended": types.PAUSED,
"Pending": types.PENDING,
"Scheduling": types.SCHEDULING,
"Failed": types.FAILED,
}
This is a good point; the error condition when the app is not yet running would be confusing. I can change condition 1) above to 'Scheduling', which is a currently defined state and roughly reflects the kubernetes app status.
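A rough sketch of how that could look, using the state names from the snippet above (illustrative stand-in types, not the PR's exact code):

```go
// Sketch only: map the status string reported by kubevirt to a SwState,
// treating "app not visible in the cluster yet" as Scheduling instead of
// Broken. SwState and its values are simplified stand-ins for pillar types.
package main

import "fmt"

type SwState int

const (
	UNKNOWN SwState = iota
	SCHEDULING
	PENDING
	RUNNING
	PAUSED
	HALTING
	FAILED
)

var stateMap = map[string]SwState{
	"Paused":     PAUSED,
	"Running":    RUNNING,
	"NonLocal":   RUNNING,
	"shutdown":   HALTING,
	"suspended":  PAUSED,
	"Pending":    PENDING,
	"Scheduling": SCHEDULING,
	"Failed":     FAILED,
}

// mapVMIStatus treats an empty status (VMI/Pod not found yet) as Scheduling
// rather than returning an error that would mark the app Broken.
func mapVMIStatus(status string) SwState {
	if status == "" {
		return SCHEDULING
	}
	if st, ok := stateMap[status]; ok {
		return st
	}
	return UNKNOWN
}

func main() {
	fmt.Println(mapVMIStatus(""))        // prints the SCHEDULING value
	fmt.Println(mapVMIStatus("Running")) // prints the RUNNING value
}
```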
Force-pushed from 1aed95e to 2e3e102
Force-pushed from 2e3e102 to 8a13e80
Force-pushed from 8a13e80 to b7140d2
Force-pushed from b7140d2 to 2453125
I have rebased and resolved the conflicts. Please review and see if the PR is ok.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4407      +/-   ##
==========================================
- Coverage   20.93%   20.90%    -0.03%
==========================================
  Files          13       13
  Lines        2895     2894        -1
==========================================
- Hits          606      605        -1
  Misses       2163     2163
  Partials      126      126

☔ View full report in Codecov by Sentry.
@@ -3603,3 +3703,16 @@ func lookupCapabilities(ctx *domainContext) (*types.Capabilities, error) {
}
return &capabilities, nil
}

func getnodeNameAndUUID(ctx *domainContext) error {
func getnodeNameAndUUID(ctx *domainContext) error {
func (ctx *domainContext) retrieveNodeNameAndUUID() error {
as this is not a getter method
Ok.
pkg/pillar/hypervisor/kubevirt.go
Outdated

// Add pod non-image volume disks
if len(diskStatusList) > 1 {
leng := len(diskStatusList) - 1
What does the "g" in "leng" stand for?
changed to 'length'.
I don't see any tests.
Force-pushed from 2453125 to acd9962
Added hypervisor/kubevirt_test.go now.
Force-pushed from acd9962 to 1129c28
- change VMI to VMI ReplicaSet for kubernetes
- change Pod to Pod ReplicaSet for containers
- change functions handling replicaset names in services
- subscribe EdgeNodeInfo in domainmgr, zedmanager to get node-name for cluster
- add Designated Node ID to several structs for App
- not to delete domain from kubernetes if not a Designated App node
- parse config for EdgeNodeClusterConfig in zedagent
- handle ENClusterAppStatus publication in zedmanager in the multi-node clustering case
- zedmanager handles effective-activation including ENClusterAppStatus
- kubevirt hypervisor changes to handle VMI/Pod ReplicaSets

Signed-off-by: Naiming Shen <[email protected]>
Force-pushed from 1129c28 to 7db2529
@@ -3603,3 +3709,16 @@ func lookupCapabilities(ctx *domainContext) (*types.Capabilities, error) {
}
return &capabilities, nil
}

func (ctx *domainContext) retrieveNodeNameAndUUID() error {
This is our own node's nodename, right?
Can't it retrieve that at domainmgr startup?
And why is "UUID" part of the name of the function?
This part I used to do at the startup of domainmgr, but 'waitEdgeNodeInfo' only waits for a maximum of 5 minutes, so I could not rely on it. In a previous review of this PR you pointed this out and I removed it. So, good point, I should also remove this function here.
I used to have UUID in the function name in code before this PR; I will remove that.
if err := hyper.Task(status).Cleanup(status.DomainName); err != nil {
log.Errorf("failed to cleanup domain: %s (%v)", status.DomainName, err)
// in cluster mode, we can not delete the pod due to failing to get app info
if !ctx.hvTypeKube {
And I get the feeling that most or all of these hvTypeKube checks should not be in domainmgr but inside the kubevirt hypervisor package.
If for instance zedmanager is telling domainmgr to shut down or delete a task and if for kubevirt this isn't needed except on some designated node, can't that check whether designated or not be done inside the hypervisor/kubevirt code?
I can move this into the hypervisor functions. I think mainly we need to pass in whether this domain config is being deleted or not (DomainStatus.DomainConfigDeleted), so we need to change the API from '.Delete(domainName string)' to '.Delete(status *types.DomainStatus)' (which will also change the other hypervisors).
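A sketch of the shape of that API change (trimmed-down stand-in types, only the Delete method shown; not the PR's exact code):

```go
// Sketch only: pass the full DomainStatus to Delete so the kubevirt
// implementation can see DomainConfigDeleted and the designated-node flag.
package main

import "fmt"

type DomainStatus struct {
	DomainName          string
	DomainConfigDeleted bool
	IsDNidNode          bool
}

// Before: Delete(domainName string) error
// After (proposed): Delete(status *DomainStatus) error
type Hypervisor interface {
	Delete(status *DomainStatus) error
}

type kubevirtContext struct{}

func (ctx kubevirtContext) Delete(status *DomainStatus) error {
	if !status.IsDNidNode || !status.DomainConfigDeleted {
		// Leave the replicaSet alone; the cluster controller keeps managing it.
		return nil
	}
	fmt.Printf("deleting %s from the cluster\n", status.DomainName)
	return nil
}

func main() {
	var h Hypervisor = kubevirtContext{}
	_ = h.Delete(&DomainStatus{DomainName: "app1", IsDNidNode: true, DomainConfigDeleted: true})
}
```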
Did you move it there? If so, this change and the preceding comment can be removed from verifyStatus(), right?
if zcfgCluster == nil {
log.Functionf("parseEdgeNodeClusterConfig: No EdgeNodeClusterConfig, Unpublishing")
pub := ctx.pubEdgeNodeClusterConfig
items := pub.GetAll()
Is there only a global key for this? If so there is no need to call GetAll - just call Unpublish("global")
Otherwise call GetAll and walk the set of returned objects and unpublish each of their keys.
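Roughly, the two options look like this (Publication is a stand-in for the pillar pubsub interface; illustrative only):

```go
// Sketch only: unpublish the EdgeNodeClusterConfig either by its single
// "global" key or by walking every published key. fakePub stands in for
// the real pubsub publication type.
package main

import "fmt"

type Publication interface {
	GetAll() map[string]interface{}
	Unpublish(key string) error
}

type fakePub struct{ items map[string]interface{} }

func (p *fakePub) GetAll() map[string]interface{} { return p.items }
func (p *fakePub) Unpublish(key string) error {
	delete(p.items, key)
	return nil
}

// Option 1: the config only ever lives under the "global" key.
func unpublishGlobal(pub Publication) { _ = pub.Unpublish("global") }

// Option 2: walk whatever keys exist and unpublish each of them.
func unpublishAll(pub Publication) {
	for key := range pub.GetAll() {
		_ = pub.Unpublish(key)
	}
}

func main() {
	pub := &fakePub{items: map[string]interface{}{"global": struct{}{}}}
	unpublishAll(pub)
	fmt.Println(len(pub.GetAll())) // 0
}
```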
Ok.
handleENClusterAppStatusImpl(ctx, key, nil)
}

func handleENClusterAppStatusImpl(ctx *zedmanagerContext, key string, status *types.ENClusterAppStatus) {
This function should be updated from our POC branch. Or I can do it as part of my PR for failover handling.
Actually I suggest amending this commit with latest handleclusterapp.go and applogs.go from POC branch.
In that way they can be reviewed as a unit and my follow up PRs need not include those files.
@@ -154,7 +154,7 @@ func (z *zedkube) checkAppsStatus() {
}

pub := z.pubENClusterAppStatus
stItmes := pub.GetAll()
stItems := pub.GetAll()
Please update this function from the POC branch, it changed a lot.
@zedi-pramodh as discussed, since some of the types/booleans changed, updating the PoC code in applog.go would require updates to many other files. I'll leave it to your later PR to add those changes.
ok no problem, I will take care of those and submit my PR once this one is merged.
- changed the function to retrieveDeviceNodeName() and call it only at the start of domainmgr Run()
- remove the ctx.hvTypeKube and status.IsDNidNode checks in the domainmgr.go code; also remove status.DomainConfigDeleted. We now rely on the normal domain handling of the delete/cleanup workflow
- fixed a bug where nodeName contained an underscore, which is not allowed in kubernetes names
- changed the zedmanager/handleclusterstatus.go code to the PoC code base, and commented out one line for a later PR to handle
- implemented the scheme where, when kubevirt can not contact the kubernetes API-server or the cluster does not have the POD/VMI scheduled yet, we return the 'Unknown' status. It keeps a starting 'unknown' timestamp per application
- also, if the 'unknown' status lasts longer than 5 minutes, it changes into 'Halt' status back to domainmgr
- updated 'zedkube.md' section 'Handle Domain Apps Status in domainmgr' for the above behavior

Signed-off-by: Naiming Shen <[email protected]>
Force-pushed from 3e80fde to ee49db4
}
enInfo := NodeInfo.(types.EdgeNodeInfo)
ctx.nodeName = strings.ReplaceAll(strings.ToLower(enInfo.DeviceName), "_", "-")
log.Noticef("retrieveDeviceNodeName: devicename, NodeInfo %v", NodeInfo) // XXX
Why the XXX comment?
// Get the EdgeNode info, needed for kubevirt clustering
err = domainCtx.retrieveDeviceNodeName()
if err != nil {
log.Fatal(err)
You can't fatal here since you only wait if hvTypeKube.
How about making Eden/Adam send the EdgeNodeInfo and wait even if not hvTypeKube?
That would remove some special cases like this issue.
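For illustration, a non-fatal alternative could wait for EdgeNodeInfo with a bounded timeout regardless of hypervisor type and degrade gracefully instead of calling log.Fatal (the channel below stands in for the pubsub notification; a sketch only, not the PR's code):

```go
// Sketch only: bounded wait for the node name, continuing without it on timeout.
package main

import (
	"log"
	"time"
)

func waitForEdgeNodeInfo(infoCh <-chan string, timeout time.Duration) (string, bool) {
	select {
	case name := <-infoCh:
		return name, true
	case <-time.After(timeout):
		return "", false
	}
}

func main() {
	infoCh := make(chan string, 1)
	infoCh <- "edge-node-1"
	if name, ok := waitForEdgeNodeInfo(infoCh, 5*time.Minute); ok {
		log.Printf("node name: %s", name)
	} else {
		// Not fatal: continue without a node name and retry later.
		log.Printf("EdgeNodeInfo not received yet; continuing")
	}
}
```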
@@ -1103,6 +1106,16 @@ func initPublications(zedagentCtx *zedagentContext) {
}
getconfigCtx.pubZedAgentStatus.ClearRestarted()

zedagentCtx.pubEdgeNodeClusterConfig, err = ps.NewPublication(pubsub.PublicationOptions{
AgentName: agentName,
Persistent: true,
Why does this need to be persistent? That makes it much more painful to add fields to the EdgeNodeClusterConfig in future EVE releases.
isDNiDnode := false
if aiConfig.DesignatedNodeID != uuid.Nil && aiConfig.DesignatedNodeID == ctx.nodeUUID {
isDNiDnode = true
Should this be set to true on a single node cluster? For a non-kubevirt case?
It would make sense to add some comment about the expectations in those cases.
st, _ := sub.Get(key)
if st != nil {
clusterStatus := st.(types.ENClusterAppStatus)
if !clusterStatus.ScheduledOnThisNode {
No publishing seems odd.
I can understand that we want the controller to ignore Info messages (and perhaps also Metrics messages) for app instances which are not scheduled on this node, but it would make more sense to either put that check in the controller (which might require a some new fields in the Info and Metrics protobuf messages) or put the code in zedagent to exclude sending Info and Metrics for such an app instance. But publish here for other agents to see a Status for each Config.
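Sketched out, that suggestion amounts to always publishing a Status for each Config and only filtering on the reporting side (simplified stand-in types; illustrative only, not the PR's code):

```go
// Sketch only: publish cluster app status unconditionally, and let zedagent
// skip Info/Metrics for app instances not scheduled on this node.
package main

import "fmt"

type ENClusterAppStatus struct {
	AppUUID             string
	ScheduledOnThisNode bool
}

// shouldSendAppInfo is the zedagent-side filter: only report Info/Metrics
// for app instances scheduled on this node.
func shouldSendAppInfo(st ENClusterAppStatus) bool {
	return st.ScheduledOnThisNode
}

func main() {
	st := ENClusterAppStatus{AppUUID: "uuid-1", ScheduledOnThisNode: false}
	if shouldSendAppInfo(st) {
		fmt.Println("send AppInstanceInfo to controller")
	} else {
		fmt.Println("skip Info/Metrics for app not scheduled on this node")
	}
}
```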
@@ -1563,6 +1617,14 @@ func updateBasedOnProfile(ctx *zedmanagerContext, oldProfile string) {
}

// returns effective Activate status based on Activate from app instance config and current profile
Comment needs to be updated.
}

if ctx.nodeUUID == uuid.Nil {
err := getnodeNameAndUUID(ctx)
Please fetch this when zedmanager starts and not here.
@@ -282,8 +282,8 @@ func waitForNodeReady(client *kubernetes.Clientset, readyCh chan bool, devUUID s
if err != nil {
return err
}
if len(pods.Items) < 6 {
return fmt.Errorf("kubevirt running pods less than 6")
if len(pods.Items) < 4 { // need at least 4 pods to be running
Where does the magic number 4 come from? Please expand on comment.
@@ -1378,3 +1707,33 @@ func (ctx kubevirtContext) VirtualTPMTerminate(domainName string, wp *types.Watc
func (ctx kubevirtContext) VirtualTPMTeardown(domainName string, wp *types.WatchdogParam) error {
return fmt.Errorf("not implemented")
}

func getMyNodeUUID(ctx *kubevirtContext, nodeName string) {
Reads more like a "set" or "save" function than a "get" function.
### Handle Domain Apps Status in domainmgr

When the application is launched and managed in KubeVirt mode, the Kubernetes cluster is provisioned for this application as either a VMI (Virtual Machine Instance) replicaSet object or a Pod replicaSet object. It uses a declarative approach to manage the desired state of the applications. The configurations are saved in the Kubernetes database for the Kubernetes controller to use to ensure the objects eventually achieve the correct state if possible. Any particular VMI/Pod state of a domain may not be in working condition at the time when EVE domainmgr checks. In the domainmgr code running in KubeVirt mode, if it can not contact the Kubernetes API server to query about the application, or if the application itself has not been started yet in the cluster, kubevirt.go will return the 'Unknown' status. It keeps an 'Unknown' status starting timestamp per application. If the 'Unknown' status lasts longer than 5 minutes, the status functions in kubevirt.go will return 'Halt' status back to domainmgr. The timestamp will be cleared once it can get the application status from kubernetes.
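For reference, a sketch of the described timestamp handling (names are illustrative, not the PR's exact code):

```go
// Sketch only: track when an app's status first became Unknown and fall back
// to a halting state once that has lasted longer than five minutes.
package main

import (
	"fmt"
	"time"
)

type SwState int

const (
	UNKNOWN SwState = iota
	RUNNING
	HALTING
)

const unknownStateTimeout = 5 * time.Minute

// unknownSince records, per application, when the Unknown status started.
var unknownSince = make(map[string]time.Time)

// effectiveState returns the state to report to domainmgr: Unknown is kept
// for a while to let the cluster catch up, then degraded to HALTING.
func effectiveState(appName string, reported SwState, now time.Time) SwState {
	if reported != UNKNOWN {
		delete(unknownSince, appName) // clear the timestamp once a real status arrives
		return reported
	}
	start, ok := unknownSince[appName]
	if !ok {
		unknownSince[appName] = now
		return UNKNOWN
	}
	if now.Sub(start) > unknownStateTimeout {
		return HALTING
	}
	return UNKNOWN
}

func main() {
	now := time.Now()
	fmt.Println(effectiveState("app1", UNKNOWN, now))                    // stays UNKNOWN
	fmt.Println(effectiveState("app1", UNKNOWN, now.Add(6*time.Minute))) // degrades to HALTING
	fmt.Println(effectiveState("app1", RUNNING, now.Add(7*time.Minute))) // back to RUNNING
}
```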
Reading the code it looks like it will return "Halting" and not "Halt" status.
@@ -1138,6 +1175,11 @@ func maybeRetryBoot(ctx *domainContext, status *types.DomainStatus) {
log.Errorf("Failed to setup vTPM for %s: %s", status.DomainName, err)

// pass nodeName to hypervisor call Setup
if status.NodeName == "" {
Can't NodeName be set when DomainStatus is created in handleCreate?
@@ -1684,6 +1726,11 @@ func doActivate(ctx *domainContext, config types.DomainConfig,
log.Errorf("Failed to setup vTPM for %s: %s", status.DomainName, err)

// pass nodeName to hypervisor call Setup
if status.NodeName == "" {
Ditto